Deciding between man and zone coverage is one of the most critical strategic decisions a defensive coordinator must make before each offensive play in American football. While experienced offensive coordinators and quarterbacks often rely on visual cues to identify these defensive schemes, the increasing availability of player tracking data offers a new avenue to uncover these tactics. A notable example is Amazon’s NFL Next Gen Stats model, which delivers coverage predictions during live broadcasts (see snapshot below). However, pre-snap motion does not seem to play a prominent role in this model (see Amazon), even though it is a crucial element of modern offensive strategies.
Hence, our contribution explores the potential of incorporating pre-snap motion. While we similarly predict man or zone coverage before motion, we further leverage the additional information contained in pre-snap player movements. Specifically, in addition to including rather naive post-motion features, we use a hidden Markov model (HMM) to model defenders’ trajectories based on hidden states, which represent the offensive players they may be guarding. Incorporating summary statistics of the state decoding results as features into the existing models substantially improves predictive performance. This lays the groundwork for further analyses, such as evaluating the effectiveness of pre-snap motion in uncovering defensive strategies.
We aim to predict man or zone coverage using the
pff_passCoverage indicator in the play-by-play data. We omit
plays tagged as others, plays with more than five offensive
linemen, and plays with two quarterbacks. Since we are specifically
interested in analyzing pre-snap player movements, we concentrate on
plays that contain pre-snap motion. Ultimately, we end up with \(3985\) offensive plays in total, of which
the defense played \(2973\) (about 75%) in zone and
\(1012\) (about 25%) in man coverage.
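The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: apart from pff_passCoverage, every field name (`n_offensive_linemen`, `n_quarterbacks`, `has_presnap_motion`) and the label values `"Man"`, `"Zone"`, and `"Other"` are hypothetical stand-ins, not the actual columns of the data.

```python
# Toy play records. Only `pff_passCoverage` is named in the text;
# all other field names and the label values are assumptions.
plays = [
    {"play_id": 1, "pff_passCoverage": "Zone", "n_offensive_linemen": 5,
     "n_quarterbacks": 1, "has_presnap_motion": True},
    {"play_id": 2, "pff_passCoverage": "Man", "n_offensive_linemen": 5,
     "n_quarterbacks": 1, "has_presnap_motion": True},
    {"play_id": 3, "pff_passCoverage": "Other", "n_offensive_linemen": 5,
     "n_quarterbacks": 1, "has_presnap_motion": True},
    {"play_id": 4, "pff_passCoverage": "Zone", "n_offensive_linemen": 6,
     "n_quarterbacks": 1, "has_presnap_motion": True},
    {"play_id": 5, "pff_passCoverage": "Man", "n_offensive_linemen": 5,
     "n_quarterbacks": 2, "has_presnap_motion": True},
    {"play_id": 6, "pff_passCoverage": "Zone", "n_offensive_linemen": 5,
     "n_quarterbacks": 1, "has_presnap_motion": False},
]


def keep_play(play):
    """Return True if a play passes all four filters described in the text."""
    return (
        play["pff_passCoverage"] in {"Man", "Zone"}  # drop plays tagged otherwise
        and play["n_offensive_linemen"] <= 5         # at most five linemen
        and play["n_quarterbacks"] != 2              # no two-quarterback plays
        and play["has_presnap_motion"]               # motion plays only
    )


filtered = [p for p in plays if keep_play(p)]
n_zone = sum(p["pff_passCoverage"] == "Zone" for p in filtered)
n_man = len(filtered) - n_zone
print(len(filtered), n_zone, n_man)
```

On the toy records, only plays 1 and 2 survive; each of the remaining four violates exactly one rule.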
To accurately predict the defensive scheme, we create various features derived from the tracking data. In particular, we conduct the following feature engineering steps. First, using all 11 players on each side, we compute the area spanned by the convex hull of each team, the largest \(y\) distance (i.e., the width of the hull), and the largest \(x\) distance (i.e., the length of the hull), which yields six features (three per team). Then, we select the five most relevant players on each team. For the offense, we omit the offensive line and the QB; for the defense, we disregard defensive linemen (NT, DT, DE) and select the five defenders closest to the five offensive players according to a weighted Euclidean distance that puts much more emphasis on the \(y\)-axis. From these 10 players, we derive 20 features related to their (standardized) positions.
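As a rough illustration of these two steps, the sketch below computes the three hull features for one team and picks the defenders nearest to the selected offensive players. It rests on several assumptions that the text does not specify: the weights `w_x` and `w_y` are placeholder values, and "closest" is read here as the smallest weighted distance to any of the offensive players, which is only one possible interpretation of the selection rule.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_features(points):
    """Hull area (shoelace formula), y-extent (width) and x-extent (length)."""
    hull = convex_hull(points)
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1])
    )
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "area": 0.5 * abs(twice_area),
        "width_y": max(ys) - min(ys),
        "length_x": max(xs) - min(xs),
    }


def weighted_distance(p, q, w_x=1.0, w_y=5.0):
    """Weighted Euclidean distance; the weights are hypothetical placeholders."""
    return math.sqrt(w_x * (p[0] - q[0]) ** 2 + w_y * (p[1] - q[1]) ** 2)


def closest_defenders(offense, defenders, k=5, w_x=1.0, w_y=5.0):
    """Indices of the k defenders nearest (weighted) to any offensive player."""
    scores = [
        min(weighted_distance(d, o, w_x, w_y) for o in offense)
        for d in defenders
    ]
    return sorted(range(len(defenders)), key=scores.__getitem__)[:k]


# Toy usage with made-up coordinates.
feats = hull_features([(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)])
picked = closest_defenders([(0, 0)], [(3, 0), (0, 2), (1, 0)], k=2)
print(feats, picked)
```

In the toy call, the defender at `(0, 2)` is closer than the one at `(3, 0)` in plain Euclidean terms, but the heavier `y` weight ranks it last, which is the intended effect of emphasizing lateral alignment.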
Additionally, we extract information from the play-by-play data, such as quarter, down, yards to go, home and away scores, and the seconds remaining in the current half. See the Appendix for a more detailed description and a discussion of the choice of features.
We train different models to predict whether the defense plays a man or zone coverage scheme. Since the aim of the project is to show the value of pre-snap motion, we follow a three-step approach:
Our dataset is limited (only 3985 plays), so we need to manage model complexity by controlling the number of features. We therefore focus on the 32 basic features described above.
As basic model classes for predicting man or zone
coverage, we opt for the following two. First, we fit a
glmnet (elastic net) model, which performs implicit feature
selection and can handle multicollinearity. Second, we use an
xgboost model, which additionally captures non-linear
effects and interactions. For both models, we tune the hyperparameters
via 10-fold cross-validation on a suitable grid.
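For reference, the elastic net objective for a binary outcome can be written as follows. This assumes the standard parameterization used by glmnet for logistic regression, and coding \(y_i = 1\) for man coverage is our assumption for illustration only:

\[
\min_{\beta_0, \beta} \; -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \big(\beta_0 + x_i^\top \beta\big) - \log\big(1 + e^{\beta_0 + x_i^\top \beta}\big) \Big] + \lambda \Big[ \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \Big]
\]

Here \(\lambda \ge 0\) controls the overall penalty strength and \(\alpha \in [0, 1]\) mixes the ridge (\(\alpha = 0\)) and lasso (\(\alpha = 1\)) penalties. The \(\ell_1\) part can shrink coefficients exactly to zero, which is the implicit feature selection, while the \(\ell_2\) part stabilizes the estimates of correlated features.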
First, we fit the aforementioned models with the previously described basic pre-motion features. These very basic models serve as baselines that allow us to measure the effect of the pre-snap motion features added in the subsequent steps.
Second, we extend our basic pre-motion model with naive post-motion features. To keep the complexity manageable, we derive only six additional post-motion features: for each team (offense and defense), we compute the maximum \(y\)-distance, the maximum \(x\)-distance, and the total distance traveled by its players until the snap.
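A minimal sketch of these six features is given below. The text does not say exactly how the maximum distances are measured, so the code assumes one plausible reading: the largest \(x\)- or \(y\)-range covered by any single player of a team. The function name, the player labels, and the input layout (a time-ordered list of positions per player, ending at the snap) are hypothetical as well.

```python
import math


def team_motion_features(tracks):
    """Summarize one team's movement until the snap.

    tracks: dict mapping a player label to a time-ordered list of (x, y)
    positions. Reading "maximum distance" as the largest per-player
    coordinate range is an assumption, not the documented definition.
    """
    x_ranges = [
        max(x for x, _ in path) - min(x for x, _ in path)
        for path in tracks.values()
    ]
    y_ranges = [
        max(y for _, y in path) - min(y for _, y in path)
        for path in tracks.values()
    ]
    # Path length: sum of frame-to-frame displacements over all players.
    total = sum(
        math.dist(a, b)
        for path in tracks.values()
        for a, b in zip(path, path[1:])
    )
    return {
        "max_x_dist": max(x_ranges),
        "max_y_dist": max(y_ranges),
        "total_dist": total,
    }


# Toy trajectories with made-up coordinates.
offense = {"WR1": [(0, 0), (0, 3), (0, 7)], "RB": [(1, 1), (1, 1)]}
defense = {"CB1": [(5, 0), (5, 4)], "S": [(10, 2), (13, 6)]}

features = {
    f"{team}_{name}": value
    for team, tracks in (("offense", offense), ("defense", defense))
    for name, value in team_motion_features(tracks).items()
}
print(features)
```

The resulting dictionary has three entries per team, matching the six post-motion features described above.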